Public datasets
General
- research-quality data sets by Hilary Mason
- Guardian Datablog: all datasets
- Yahoo! Webscope - datasets from Yahoo! research
- http://www.indexmundi.com/
- http://www.freebase.com/
- http://www.google.com/publicdata/home - Google Public Data Explorer
- AWS: Public Data Sets
- http://www.gapminder.org/
- http://ckan.net/
- The Dataverse Network
- Biological database
- Wikipedia related datasets
- United States Census
- myPersonality dataset
- Wikipedia
- https://quarry.wmflabs.org - “Run SQL queries against Wikipedia & other databases”
Brain
Governments
Society
Law enforcement
- The Proceedings of the Old Bailey, 1674-1913
- http://www.fatalencounters.org/ - a database of people killed during interactions with law enforcement.
- COMPAS Recidivism dataset
Climate change on social media
Economy
- http://open.bloomberg.com/
- ReferenceUSA Business Historical Data Files
- Longitudinal Employer Household Dynamics
Environment
Food
Nutrient
- http://nutritiondata.self.com/ - nutrition data.
- USDA National Nutrient Database
Ingredients
Restaurants
Recipes
Menu
Shopping
- Tesco Grocery 1.0: Aiello2020Tesco
- Instacart dataset
Geography
Population
Health
Weather
- Mathematica - weather data is available through its built-in WeatherData function.
- http://www.infochimps.com/tags/weather
- http://aws.amazon.com/datasets/2759 - Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD)
- http://www7.ncdc.noaa.gov/CDO/cdoselect.cmd?datasetabbv=GSOD
- http://code.google.com/p/flyontime/source/browse/trunk/analysis/README.txt
- http://ckan.net/package/search?q=temperature
Languages
- SQuAD: The Stanford Question Answering Dataset
- http://googleresearch.blogspot.com/2013/12/free-language-lessons-for-computers.html
Law and policies
- Harvard Caselaw project
- Corpus of Resolutions: UN Security Council (CR-UNSC)
- Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset
Media
Movies
- data movies - “download data from IMDB movies and parse into useful form”
- IMDb
- The Internet Movie Script Database
- Rotten Tomatoes
Music
See also Music
- http://labrosa.ee.columbia.edu/millionsong/
- http://musicdatascience.com/emi-million-interview-dataset/
- The Whitburn Project
- http://www.discogs.com - music database
- http://developer.echonest.com - song characteristics (bpm, danceability, etc.)
- http://www.whosampled.com
- Billboard Top 100 Songs, 1950-2015
- MusicNet
Networks
Mobility
Roads
Religion
- https://religiondatabase.org/browse - the Database of Religious History
Science of science
- https://www.ogrants.org - Open Grants
- Accuracy of Models for Mapping the Medical Sciences
- http://sdb.cns.iu.edu
- ACL Anthology Corpus - Full Text
U.S. Congress
Web
- http://www.commoncrawl.org/
- http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset
- Wikipedia clickstream dataset